Wellcome Open Research
○ F1000 Research Ltd
Preprints posted in the last 7 days, ranked by how well they match Wellcome Open Research's content profile, based on 57 papers previously published here. The average preprint has a 0.07% match score for this journal, so anything above that is already an above-average fit.
Mettananda, C.; Sivasumithran, K.; Ranaweera, L.; Madhubhashini, A.; Ranawaka, C.; Pathmeswaran, A.; Dassanayake, A.
Background The European Association for the Study of the Liver (EASL) - Steatotic Liver Disease (SLD) screening algorithm involves two steps: initial screening with FIB-4 followed by referral for vibration-controlled transient elastography (VCTE) in patients likely to have significant fibrosis (SF). However, VCTE is not widely available in resource-limited settings. Aim To optimise the EASL SLD screening algorithm for resource-poor settings using machine learning (ML). Methods We analysed data from 964 adults aged ≥35 years who underwent VCTE at a tertiary referral centre in Sri Lanka between November 2024 and 2025. Multiple ML models using different methods and variable combinations were trained on 80% of the dataset and tested on the remaining 20%. Best models were selected based on performance and externally validated using data from 430 patients who underwent VCTE before November 2024. Model performance was compared with the FIB-4 using confusion matrices. Results A Random Forest model incorporating age, AST, ALT, and platelet count separately, rather than using FIB-4, outperformed. The all-variable ML model showed the best predictive performance for SF, with accuracy of 77.2%, recall of 0.762, precision of 0.778, and AUC-ROC of 0.818. The variables used in the model, in descending order of feature importance, were AST, platelet count, BMI, ALT, age, diabetes mellitus, hypertension, dyslipidaemia, sex, family history, hypothyroidism, diabetes complication and smoking. External validation demonstrated 75.1% accuracy and an AUC of 0.779. When used as the first step of the SLD screening algorithm, the all-variable ML model identified 37 (17.1%) additional true positives and reduced false-negative diagnoses by 50% compared with FIB-4. Conclusions ML-based models were more effective than the FIB-4 score as the first-line screening tool for VCTE referral, substantially improving the identification of patients with significant fibrosis in this South Asian cohort.
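For readers unfamiliar with the baseline in the abstract above, the following is a minimal sketch of the standard FIB-4 formula and of the confusion-matrix metrics the abstract reports (accuracy, recall, precision). The function names and the example counts are illustrative assumptions; the abstract does not state which FIB-4 cutoff the authors used, so none is hard-coded here.

```python
import math


def fib4(age_years: float, ast_u_l: float, alt_u_l: float, platelets_1e9_l: float) -> float:
    """Standard FIB-4 index: (age x AST) / (platelets x sqrt(ALT)).

    Units: age in years, AST and ALT in U/L, platelets in 10^9/L.
    """
    return (age_years * ast_u_l) / (platelets_1e9_l * math.sqrt(alt_u_l))


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, recall (sensitivity) and precision from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


# Hypothetical patient and hypothetical counts, for illustration only.
score = fib4(age_years=50, ast_u_l=40, alt_u_l=25, platelets_1e9_l=200)
metrics = confusion_metrics(tp=8, fp=2, tn=7, fn=3)
print(score, metrics)
```

Comparing two first-step screens on the same patients then reduces to computing these metrics once per screen against the VCTE reference.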
Heller, D. J.; Elkersh, Y.; Nonterah, E. A.; Kuwolamo, I.; Horowitz, C. R.; Alvarez, E. E.; Awine, T.; Govindarajulu, U.; Squires, A. P.; Aborigo, R. A.
Introduction: Hypertension is the world's leading cause of death, and depression its leading cause of disability. Control rates for these noncommunicable diseases (NCDs) are low in low- and middle-income countries (LMICs). Many LMICs have programs to screen and treat underserved communities for infectious diseases, but evidence to adapt them to treat NCDs is limited. We developed and tested a noncommunicable disease program through Ghana's Community-Based Health Planning and Services (CHPS) primary care initiative. Methods: We trained 8 CHPS nurses to diagnose and treat hypertension and depression through door-to-door screening and pharmacotherapy. Physician assistants provided telehealth supervision. We combined this treatment with volunteer counseling to boost medication adherence, improve mood, and change health behaviors. We called the 90-day intervention the CHPS Opportunity for Mentally and Behaviorally Integrated NCD Engagement (COMBINE). Results: We recruited 60 adults from 580 screened: 37 with hypertension (mean blood pressure (BP) of 149/91 mm Hg) and 23 with depression (mean Patient Health Questionnaire (PHQ-9) score of 13.3). After 90 days, 57/60 (95%) completed the intervention: 32/37 (86%) achieved blood pressure control (mean BP 122/75 mm Hg), and 19 of 20 (95%) achieved depression control (mean PHQ-9 score 2.0). After 12 months, 51/60 were retained: 33/37 with hypertension (89%) and 18/23 with depression (78%), with a mean BP of 121/75 mm Hg and a mean PHQ-9 score of 1.4, respectively. All 51 (100%) achieved disease control at 12 months. Five participants left by migration and four by escalation to higher-level care. Conclusions: The COMBINE model achieved high levels of diagnosis, care retention, and disease control, with minimal adverse events, in a remote setting with limited usual NCD care.
This model suggests a novel means to improve the care cascade for these and other noncommunicable diseases through existing non-physician care models in LMICs, warranting further controlled testing at scale.
Vidaletti, L. P.; Dos Santos, A. M.; Hellwig, F.; Barros, A. J. D.
Background: The traditional wealth index, based on principal component analysis (PCA), used in the Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS), suffers from urban bias, distorting estimates of health inequality. We compared the traditional index (PEAR1) with an alternative two-component polychoric PCA index (POLY2) using annual expenditure from 12 LSMS surveys as the gold standard to determine which provides more accurate SEP measures for equitable policy targeting. Methods: We compared the traditional wealth index (PEAR1) with a two-component polychoric PCA approach (POLY2) using 12 LSMS (Living Standards Measurement Study) surveys (2015-2022) from 12 African countries. Annual household consumption expenditure was the gold standard. We assessed agreement using weighted Cohen's kappa and validated against education (proportion of households with secondary or higher education) using the concentration index (CIX) and slope index of inequality (SII). Results: The POLY2 index showed higher agreement with expenditure quintiles (average national weighted kappa = 43.3%) than the PEAR1 index (35.1%), with notable improvements in urban (43.5% vs. 27.5%) and rural (35.3% vs. 22.4%) areas. POLY2 also attenuated extreme household distributions observed in PEAR1. Education validation showed that POLY2 produced intermediate inequality gradients between the flatter expenditure-based gradient and the steeper PEAR1-based gradient. Conclusion: The POLY2 wealth index is superior to the traditional index, reducing urban-rural bias and providing more accurate socioeconomic classifications. Its adoption in large-scale surveys such as DHS and MICS is recommended to improve equitable monitoring of health inequalities in low- and middle-income countries.
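The agreement statistic in the abstract above can be illustrated with a short sketch of weighted Cohen's kappa for two quintile classifications of the same households. The abstract does not say which weighting scheme was used; linear weights are assumed here, and the function name and toy data are hypothetical.

```python
from collections import Counter


def weighted_kappa(a: list, b: list, k: int = 5) -> float:
    """Linearly weighted Cohen's kappa for ordinal categories 1..k.

    a and b are equal-length lists giving each household's category
    (for example wealth quintile and expenditure quintile).
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    n = len(a)

    def weight(i: int, j: int) -> float:
        # Full credit for exact agreement, partial credit for near misses.
        return 1 - abs(i - j) / (k - 1)

    observed = Counter(zip(a, b))
    row, col = Counter(a), Counter(b)
    p_obs = sum(weight(i, j) * c / n for (i, j), c in observed.items())
    p_exp = sum(
        weight(i, j) * (row[i] / n) * (col[j] / n)
        for i in range(1, k + 1)
        for j in range(1, k + 1)
    )
    if p_exp == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_obs - p_exp) / (1 - p_exp)


# Toy data: identical rankings versus fully reversed rankings.
print(weighted_kappa([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))
print(weighted_kappa([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))
```

Multiplying the result by 100 gives the percentage form used in the abstract.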
Wong, K.; Pitcher, D.; Masoud, S.; Tzoumkas, K.; Branson, A.; Oates, T.; Gear, S.; Russell, H.; RaDaR consortium, ; Francke, K.; Inan-Eroglu, E.; Abdelgawwad, K.; Liu, S.; Dasmahaptra, P.; Lin, J.; Mercer, A.; Hendry, B.; Lennon, R.; Turner, A. N.; Gale, D. P.
Background Alport Syndrome (AS), caused by pathogenic variants in type IV collagen genes COL4A3/4/5, is a leading monogenic cause of Kidney Failure (KF). Clinical course varies widely, and disease-specific predictors of progression relevant to clinical care and trial design remain incompletely defined. Methods In this retrospective cohort study of individuals with AS in the UK National Registry of Rare Kidney Diseases, patients were classified as having AS or heterozygous genotypes and followed to assess proteinuria progression, eGFR slope and kidney survival. Proteinuria and eGFR trajectories were analysed using mixed-effects regression models; kidney survival using Kaplan-Meier analysis. Results Among 1032 participants (median follow-up 11.6 years; 47% female), 475 (46%) had AS genotypes (male XLAS or autosomal recessive AS). eGFR decline accelerated with advancing CKD stage across all genotypes (p<0.001). Proteinuria increased as eGFR declined and occurred earlier in AS genotypes. After reaching proteinuria thresholds of more than 1.0 and 3.0 g/g, kidney survival over the subsequent 5 years did not differ significantly between genotypes (logrank p=0.14, p=0.17, respectively), although modest differences emerged over longer follow-up. Across eGFR thresholds (90, 60, and 45 mL/min/1.73m2), higher proteinuria was associated with shorter time to KF; for example, at eGFR 45 mL/min/1.73m2, median time to KF was 3.0 years (IQR, 1.6-5.4) for above-median vs 6.5 years (5.1-not estimable) for below-median proteinuria (p<0.0001). Almost all patients who reached KF had developed proteinuria of more than 0.3 g/g. Conclusion In this national cohort, eGFR decline accelerated with CKD stage and proteinuria was strongly associated with progression to KF across genotypes. The non-linearity of eGFR decline may inform its interpretation in clinical practice and use as a trial endpoint.
Once comparable proteinuria levels were reached, differences in outcomes by genotype were attenuated, supporting proteinuria as a key prognostic marker and strengthening the rationale for its use as a surrogate endpoint in AS clinical trials.
Munyangi wa Nkola, J.; Akilimali Zalagile, P.; Lukuke Mbutshu, H.; Kabala Munyemo, S.; Ramazani Bin Eradi, I.; CAMARA, A.
Background: Artemisinin-based combination therapies remain the mainstay of malaria control strategies; nevertheless, the advent of genetic markers linked to partial artemisinin resistance in Plasmodium falciparum has elicited substantial concern across African settings. To assess the prevalence, geographic distribution, and clinical associations of these molecular markers, we undertook a systematic review and meta-analysis of observational cohort studies. Methods: We conducted a search of cohort studies published between January 2015 and June 2025, following PRISMA 2020 guidelines. We queried databases including PubMed/MEDLINE, Scopus, Web of Science, and CINAHL. Eligibility required prospective enrollment of patients, longitudinal monitoring (therapeutic efficacy studies), and pfkelch13 propeller domain genotyping. Results: A meta-analytical synthesis of 888 isolates from six core prospective cohorts revealed a pooled prevalence of 6% (95% CI: 2.1%-11.8%) for validated pfkelch13 mutations. A profound geographic dichotomy was identified: while West and Central African cohorts maintained a 0% prevalence, East African hotspots showed significant expansion, with prevalence reaching 12.8% in Rwanda and up to 25.5% in Northern Uganda; high statistical heterogeneity reflects this biological divergence. Conclusions: These findings highlight the established and expanding presence of artemisinin partial resistance in East Africa. Standardized surveillance is essential to adapt malaria control policies across the continent. Keywords: Africa; artemisinin resistance; clinical indicators; pfkelch13 gene; molecular markers; partial resistance; Plasmodium falciparum.
Leonard, S. A.; Dysart, K.; Callahan, A.; Siadat, S.; Zhang, J.; Handley, S. C.; Huybrechts, K. F.; Igbinosa, I.; Bateman, B. T.
Background: Epic Cosmos is a relatively new centralized electronic health record dataset with high potential utility in perinatal epidemiologic research. Objectives: The study objectives were to develop replicable steps to create longitudinal, linked maternal-infant cohorts in Cosmos, assess completeness of key variables, evaluate potential selection bias with restrictions for longitudinal healthcare encounters, and provide an example epidemiologic analysis. Methods: We created maternal-infant cohorts by starting with live births during 2023-2024 recorded in the BirthFact data table and joining with additional data tables as needed. We selected and created variables for perinatal characteristics, common comorbidities, and routinely measured vital signs and laboratory values, and assessed variable completeness. We sequentially restricted the birth cohort for maternal-infant linkage and longitudinal healthcare from first-trimester prenatal care encounter through infant follow-up care within 12 weeks post-discharge from birth hospitalization. Finally, we conducted an example analysis of the association between high systolic blood pressure in the first trimester (≥140 mm Hg) and later onset of preeclampsia among those with chronic hypertension. Results: The total linked birth cohort included 2,624,186 pregnancies. Completeness was >90% for most variables assessed but was 77% for racial and ethnic group and 76% for body mass index at delivery. Characteristics of the cohort were similar to those reported for the entire United States birth population based on birth certificate data, including similar regional and racial-ethnic composition. Longitudinal cohort restriction requiring linked records from first trimester prenatal care through infant follow-up care reduced the cohort size to 509,148 pregnancies. However, restriction had minimal effects on cohort characteristics.
In the example analysis, high systolic blood pressure was associated with increased risk of preeclampsia among those with chronic hypertension (aRR: 1.26; 95% CI: 1.22, 1.30). Conclusions: This study provides a rigorous and reproducible approach to creating longitudinal, linked maternal-infant cohorts in Epic Cosmos and the analytical findings suggest high data quality and representativeness.
Chen, F.; You, R.; Liu, Y.; Yin, Y.; Liu, A.; Deng, L.; Xie, B.; Fan, J.; Wang, W.
Background and Aims: MASLD has become the most prevalent chronic liver disease globally. Although MVPA and plasma fatty acids have been individually studied in relation to metabolic health, their independent and combined associations with MASLD incidence remain unclear. We aimed to investigate these associations. Methods: This study included 51,717 UK Biobank participants free of liver disease at baseline, with MVPA measured using wrist-worn accelerometers and plasma fatty acids quantified via NMR. Multivariable-adjusted Cox models and restricted cubic splines were used. Results: Over a median follow-up of 7.8 years, 472 incident cases were identified. In fully adjusted models, meeting recommended MVPA levels together with higher n-6 PUFA concentrations was associated with a 71% lower risk (HR 0.29, 95% CI 0.18-0.45). The MVPA-MASLD association was nonlinear, with risk reduction plateauing at approximately 189 minutes per week. Higher n-6 PUFA was associated with reduced risk, whereas n-3 PUFA showed no significant association. Conclusions: These findings suggest that behavioral and metabolic factors may jointly influence MASLD risk. Further studies in diverse populations are needed to confirm these associations.
Muddiman, R.; Donoghue, P.; Gomez Lemus, J.; Doherty, A. S.; Boland, F.; McCarthy, C.; Moriarty, F.
Purpose In deprescribing studies, a prescription-free gap is typically used to determine if patients discontinued their treatment. An appropriate gap depends on the typical time between prescriptions during continued use. This work aims to characterise the interval between prescriptions of chronic drugs using different methods for a cohort of older people in primary care in Ireland. Methods The empirical prescription interval was analysed for 38,154 patients for the twenty most common drug classes and the association between covariates and the interval was analysed using a multi-level model. Estimates were also compared to those obtained from the parametric waiting time distribution (pWTD) approach. Results Available covariates had consistent relationships with prescription intervals across drug classes. For example, each additional prescription issue was associated with an increase in the interval by 5.0 (NSAIDs) to 19.7 days ("Other antidepressants"). Full public health cover was associated with a -29.0 day (inhaled adrenergics) to -11.0 day (opioids) change relative to partial cover, while other/private cover had a -17.9 day (benzodiazepines and associated drugs) to -7.1 day (SSRI and SNRIs) change relative to partial cover. The pWTD also produced consistent estimates of the population interval for most drugs. Conclusions The interval varied substantially within drug classes, due to a mixture of patient, practice and unmodelled factors. Variation between practices was effectively explained, with residual variation between patients and within patients. The pWTD approach is useful for describing complex distributions of intervals, and may be more appropriate for inferring a gap than summarising truncated data.
Shukla, N.; Bartington, S. E.; Hansell, A. L.; Lucas, T. C.
Background: In the absence of high-resolution response data, exposure-response modelling often relies on aggregated low-frequency exposure data, leading to loss of high-resolution information. Mixed Data Sampling (MIDAS) from econometrics offers an alternative but is limited due to its inability to make high-resolution predictions, inflexible likelihoods and penalised nonlinear functions, and limited visualization options. We propose a mixed-frequency Distributed Lag Non-linear Model (mf-DLNM) which can eliminate the need to aggregate exposure data in environmental epidemiology and provide high resolution predictions for time series studies. Methods: We evaluated the inference and predictive performance of the mf-DLNM. To evaluate its ability to estimate exposure-response relationships, we applied mf-DLNM and same-frequency (sf)-DLNM using data from the West Midlands, UK. Additionally, we compared the predictive performance of mf-DLNM with sf-DLNM and MIDAS across nine regions of England. As MIDAS cannot predict at the resolution of the predictor (daily), we compared the predictive performance of mf-DLNM and MIDAS at weekly resolution. To test the model's ability to predict high temporal resolution risk (daily), we compared sf-DLNM (with access to daily mortality counts) with mf-DLNM (with access only to weekly mortality counts). Results: In the West Midlands example, mf-DLNM performed comparably to sf-DLNM in estimating daily risk of temperature on respiratory mortality. Furthermore, mf-DLNM and MIDAS exhibited similar performance for weekly predictions. For high-resolution predictions, mf-DLNM and sf-DLNM showed nearly similar performance, despite mf-DLNM having access only to low-resolution response data. 
Conclusion: This mixed-frequency approach in environmental epidemiology overcomes the limitations of predicting health risks using aggregated exposure data and provides estimates of high-resolution outcomes in the absence of high-frequency health outcome datasets.
Mamak, F.; Yu, Z.; Triozzi, J. L.; Corty, R.; Wheless, L.; Wang, G.; Giri, A.; Chen, H. C.; Wilson, O. W.; Bick, A. G.; Gaziano, J. M.; Tao, R.; Hung, A. M.
Importance: Recently, proteinuria has been accepted as a surrogate end point for clinical trials in focal segmental glomerulosclerosis (FSGS) and IgA nephropathy. However, proteinuria has not been evaluated in Apolipoprotein L1 (APOL1)-mediated kidney disease (AMKD). Methods: Real world data (RWD) analysis of 128 patients of African ancestry with APOL1 high-risk genotypes, without diabetes, enrolled in the Million Veteran Program (MVP; n=109) or the biorepository at Vanderbilt University (BioVU; n=19), who had urine albumin-creatinine ratio (UACR) >= 420 mg/g (PCR ~0.9 g/g) with a concurrent GFR value. The main predictor was change in the log-UACR at 12 months. The primary outcome was annual GFR slope over 24 months. Secondary outcomes included a kidney composite of a sustained 30% GFR decline, end stage kidney disease (ESKD) or death, and ESKD as a single outcome. Linear regression and Cox proportional hazards models were used to assess the association between changes in UACR and the outcomes. Results: In the pooled analysis the mean age was 56.8 (SD 15.5) y, 116 were male (90.6%) and three patients had a diagnosis of FSGS at baseline. Mean baseline eGFR was 46.8 (SD 16.1) mL/min/1.73m2, mean baseline UACR was 1240.8 (1107.7) mg/g, mean eGFR slope was -4.67 [-6.00, -3.33] mL/min/1.73m2/year and the geometric mean percentage change in the UACR at 12 months was -57.5% [-65.0%, -48.4%]. For every 1 unit of log (UACR) increment at 12 months, the annual eGFR slope changed by -1.80 [-2.56, -1.03] mL/min/1.73m2 in the pooled analysis. For every 1 unit of log (UACR) increment at 12 months, the Cox regression showed a 61% increase in the risk of a kidney composite (p=0.002) and a 98% increase in the risk of ESKD (p<0.001).
It was estimated that a 50% reduction of UACR at 12 months was associated with a 28% reduction in the kidney composite endpoint (adjusted hazard ratio [aHR]=0.72; 95% confidence interval [CI]:0.59-0.88; p=0.002), and a 38% reduction in the risk of ESKD (aHR=0.62; 95% CI:0.49-0.80; p<0.001). Conclusions and relevance: Changes in UACR at 12 months significantly modify the rate of decline of GFR over 24 months and clinically meaningful endpoints, supporting the use of UACR changes as a surrogate endpoint in AMKD.
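The two sets of hazard ratios in the abstract above (per 1-unit increase in log UACR, and per 50% reduction in UACR) are consistent with one another if the log is the natural logarithm, which is an assumption here since the abstract does not state the base. A minimal sketch of the conversion, with a hypothetical function name:

```python
import math


def hr_for_ratio(hr_per_log_unit: float, uacr_ratio: float) -> float:
    """Rescale a hazard ratio per 1-unit change in ln(UACR) to a given UACR ratio.

    In a Cox model, HR(ratio) = exp(beta * ln(ratio)) with beta = ln(HR per unit),
    which simplifies to hr_per_log_unit ** ln(ratio).
    """
    return hr_per_log_unit ** math.log(uacr_ratio)


# 61% and 98% increases per log unit correspond to HRs of 1.61 and 1.98.
composite = hr_for_ratio(1.61, 0.5)  # 50% reduction means a ratio of 0.5
eskd = hr_for_ratio(1.98, 0.5)
print(round(composite, 2), round(eskd, 2))
```

Under that assumption the rescaled values round to the 0.72 and 0.62 reported for the kidney composite and ESKD, respectively.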
Coscini, N.; Giallo, R.; Grobler, A.; Hiscock, H.; Mulraney, M.; Pope, N.
Objectives To explore caregiver and clinician perspectives on implementing mental health conversations and supports for caregivers of children with chronic conditions in paediatric outpatient clinics. Specifically, views were sought on (a) screening approaches and measures (phase 1) and (b) how feedback and support could be provided to caregivers experiencing mental health difficulties (phase 2). Methods Caregivers and clinicians from two outpatient clinics (neuromuscular and diabetes) at a tertiary paediatric hospital in Melbourne, Australia participated in online focus groups in July and August 2024. Caregivers were recruited from outpatient clinics and clinicians were recruited via email. Both groups were combined for phase 1 before separating into breakout rooms for phase 2. Two authors conducted reflexive thematic analysis of transcripts using NVivo. Results Sixteen participants (caregivers n = 8; clinicians n = 8) took part in two semi-structured focus groups. Analysis generated two overarching domains, each comprising multiple themes. Domain 1, Addressing caregiver mental health, captured themes of overwhelm and invisibility, diverse caregiving roles, and the need for time and resources to support wellbeing conversations. Domain 2, Housing the mental health conversation, encompassed themes of screening preferences, caregiver agency in confidentiality, delivery of feedback, and access to tailored supports. Conclusions Caregivers and clinicians support routine caregiver mental health discussions in paediatric outpatient settings. Caregivers favour screening at diagnosis and key transitions, with clear and actionable feedback delivered away from the child. Questions about record-keeping warrant further exploration, as do the perspectives of fathers.
Landry, T. C.; Kim, Y.
Background. Capillary refill time is a resuscitation target in septic shock [1-4], but bedside measurement is examiner-dependent. An ICU monitor co-records a photoplethysmogram on the pulse oximeter and intermittent noninvasive blood pressure cuff cycles; if the probe and the cuff share a limb, each cycle is an unplanned vascular occlusion test on the distal microvascular bed. Standard practice places the two on opposite limbs. Objective. To measure how often, in MIMIC-IV-WDB v0.1.0, charted cuff cycles show the photoplethysmographic morphology expected of a same-limb cuff and probe, and to characterize the candidate capillary refill-like signal when that morphology is present. Methods. MIMIC-IV-WDB v0.1.0 [5] was linked to the MIMIC-IV clinical database [6]. A pre-registered rule-based detector identified candidate occlusion-reperfusion signatures on the 1-Hz perfusion-index envelope around each charted cuff timestamp. The primary endpoint was the proportion of cuff cycles suitable for analysis that were detector-positive at a 15-second reperfusion threshold, with 95% confidence intervals estimated by resampling patients at a fixed seed. A secondary analysis used a locally hosted multimodal language model (a Gemma-3 derivative on a non-device server) to adjudicate the same signature on perfusion-index plots; no MIMIC-IV-WDB content left the workstation. Results. Of 9,224 charted cuff cycles, 8,909 had a usable pulse-oximeter waveform, and 268 cycles in 15 patients (4.30% of the 6,236 cuff cycles suitable for analysis, 95% CI 2.60 to 6.03) met the primary 15-second threshold. The language model adjudicated the same cycles and called 1,367 of the 8,909 cycles with a usable waveform (15.34%) signature-present, roughly five times the detector's count. Because no laterality ground truth exists, agreement with a single blinded reader served as the comparator rather than accuracy.
The two methods were about equally concordant with the reader: precision was 0.25 (95% CI 0.14 to 0.39) for the detector and 0.24 (95% CI 0.10 to 0.35) for the language model, although reweighting to the full population of cycles with a usable waveform lowered the language model to 0.030 (95% CI 0.009 to 0.053). These estimates are reference-limited: a blinded re-read of a 150-card subsample showed only moderate intra-rater reliability (Cohen's kappa 0.46 to 0.59) with systematic undercalling on the first pass, and rescoring against the corrected re-read roughly doubled precision for both methods. Conclusions. Opportunistic extraction of capillary refill-like signals from archived ICU pulse oximetry is limited in two distinct ways. First, sensor geometry limits how often the signal is recordable: cuff cycles rarely show the morphology expected of a same-limb cuff and probe pair, consistent with opposite-limb placement, so the bottleneck is geometry rather than signal processing. Second, the modest reliability of morphology adjudication limits how well any single flagged cycle can be confirmed: against a blinded reader the detector is a usable screen but a noisy confirmer, the reference is itself only moderately reliable, and the language model is no more concordant despite flagging many more cycles. The minority of cycles in which the morphology appears contain a candidate signal that may merit prospective study under controlled placement with laterality recorded.
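The abstract above estimates confidence intervals "by resampling patients at a fixed seed". One common reading of that phrase is a patient-level (cluster) percentile bootstrap, sketched below. This is an assumption about the method, not the authors' code; the function name, the number of replicates, the seed value and the toy data are all hypothetical.

```python
import random


def patient_bootstrap_ci(cycles_by_patient: dict, n_boot: int = 2000,
                         seed: int = 0, alpha: float = 0.05) -> tuple:
    """Percentile bootstrap CI for the proportion of detector-positive cycles.

    cycles_by_patient maps a patient id to a list of 0/1 flags, one per cuff
    cycle. Whole patients are resampled with replacement so that cycles from
    the same patient stay together.
    """
    rng = random.Random(seed)  # fixed seed makes the interval reproducible
    ids = sorted(cycles_by_patient)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(ids, k=len(ids))
        positives = sum(sum(cycles_by_patient[p]) for p in sample)
        total = sum(len(cycles_by_patient[p]) for p in sample)
        if total:  # skip replicates that happen to contain no cycles
            estimates.append(positives / total)
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[min(len(estimates) - 1, int((1 - alpha / 2) * len(estimates)))]
    return lower, upper


# Toy data: three patients with very different positive rates.
toy = {"p1": [1, 1, 0, 0], "p2": [0, 0, 0, 0, 0, 0], "p3": [0, 1]}
print(patient_bootstrap_ci(toy))
```

Resampling patients rather than individual cycles matters in this setting because the positive cycles are concentrated in a small number of patients, so cycle-level resampling would understate the uncertainty.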
Taylor, A. R.; Foo, Y. S.; White, M. T.
Background: Reliable inference of Plasmodium vivax recurrence states - relapse, recrudescence and reinfection (the "3Rs") - improves estimates of antimalarial efficacy. The R package Pv3Rs features a Bayesian model designed for P. vivax molecular correction, i.e., using parasite genetic data to infer recurrence states. The model is an extension of a prototype built to analyse microsatellite data from the Vivax History (VHX) and Best Primaquine Dose (BPD) trials. Methods: We re-analysed data from 212 VHX and BPD trial participants (493 recurrences) using Pv3Rs, comparing results with those from the prototype and with genetic relatedness estimated using Dcifer, a tool for estimating relatedness based on identity-by-descent. Posterior recurrence state probabilities were computed using both uniform and time-to-event priors: artificial but equal prior probabilities facilitate posterior interpretation, while time-to-event priors leverage all available information and enable re-computation of failure rates. Relatedness estimates were used to identify and correct instances of model misspecification. Results: The Pv3Rs model generated posterior probabilities for all recurrences and was able to jointly model data on all episodes per participant for 89% of participants, compared with 73% using the prototype. Recurrence state probabilities were broadly consistent across methods, though the Pv3Rs model elevated reinfection probabilities slightly. Relatedness estimates exposed various outliers consistent with half-sibling parasites and/or genotyping errors. Outlier correction impacted some per-participant failure probabilities, but reinfection-adjusted radical-cure failure rates of high-dose primaquine remained near 3%, in line with previous findings. Conclusion: Re-analysis of VHX and BPD P. vivax genetic data restates earlier reinfection-adjusted efficacy estimates.
It demonstrates the increased computational capability and misspecification sensitivity of Pv3Rs, highlighting a need for careful analyses. Using relatedness-based diagnostics alongside model-based inference, we were able to harness the advantages of model-based inference and provide a framework for future P. vivax molecular correction.
Sajib, M. S.; Tanmoy, A. M.; Kanon, N.; Jui, A. B.; Islam, M. S.; Dola, N. Z.; Hossain, M. M.; Mobarak, R.; Shahidullah, M.; Hoque, M.; Ahmed, A. N. U.; Holmes, A. H.; Saha, S. K.; Saha, S.; Wan, Y.; Hooda, Y.
Background Healthcare-associated infections pose a major burden to neonatal health worldwide and remain difficult to track in low-resource hospitals because patient movement data and pathogen genomic data are rarely integrated into actionable transmission models. Existing approaches are often restricted to specific settings, highly structured electronic health records (EHRs), or analyses focused on either patient movements or pathogen characteristics alone. To address this gap, we developed PathoPath, an open-source integrative modelling platform, and evaluated its utility in a high burden paediatric hospital in Dhaka, Bangladesh. Methods PathoPath is an open-source R package that combines electronic health records with whole genome sequencing data to generate contact networks from direct and indirect contacts using minimal structured inputs. We retrospectively applied PathoPath to 373 cases of Klebsiella pneumoniae species complex (KpSC) infection identified in 2021 at the largest paediatric referral hospital in Dhaka, Bangladesh. Ward level patient movement trajectories were used to reconstruct contact networks, and genomic data from isolates from children <60 days were integrated to identify probable dissemination of bacterial clones and antimicrobial resistance plasmids. Findings PathoPath identified 750 direct contacts among 317 patients, forming 25 connected components, with the largest including 93 patients. KpSC infections were identified across 21 of 37 wards, with the neonatal intensive care unit accounting for 77.9% of all cases. Integration of genomic and network data distinguished sustained clustering of ST147 from multiple probable inter-clonal dissemination events involving IncFII plasmids carrying blaNDM-5 and/or blaOXA-181 within ST16. Four dominant sequence types accounted for 65.6% of sequenced isolates, and carbapenemase genes were detected in 95.8%. 
Interpretation PathoPath reconstructs hospital-wide contact networks and integrates them with pathogen genomics to map probable dissemination of pathogens and antimicrobial resistance using minimal structured clinical data. It could support more targeted infection prevention and control in hospitals where granular digital records are not available.
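To make the network terms in the abstract above concrete (direct contacts, connected components), here is a minimal sketch of how ward-level stays could be turned into a contact graph. It is written in Python for illustration and is not the PathoPath R package or its API; the contact rule (same ward, overlapping dates), the tuple layout and all names are assumptions.

```python
from itertools import combinations


def direct_contacts(stays: list) -> set:
    """Pairs of patients whose stays overlap in time on the same ward.

    Each stay is a tuple (patient_id, ward, start_day, end_day), inclusive.
    """
    contacts = set()
    for (p1, w1, s1, e1), (p2, w2, s2, e2) in combinations(stays, 2):
        if p1 != p2 and w1 == w2 and s1 <= e2 and s2 <= e1:
            contacts.add(frozenset((p1, p2)))
    return contacts


def connected_components(patients: list, contacts: set) -> list:
    """Group patients into connected components with a union-find structure."""
    parent = {p: p for p in patients}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for pair in contacts:
        a, b = tuple(pair)
        parent[find(a)] = find(b)

    groups = {}
    for p in patients:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())


# Hypothetical stays: A overlaps B, B overlaps C, D is on another ward.
stays = [("A", "NICU", 1, 5), ("B", "NICU", 4, 8), ("C", "NICU", 7, 9), ("D", "W2", 1, 3)]
contacts = direct_contacts(stays)
components = connected_components(["A", "B", "C", "D"], contacts)
print(len(contacts), sorted(len(c) for c in components))
```

In this toy example A and C never share the ward at the same time, yet they fall in one component through B, which is the kind of indirect link a component-level view exposes.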
Wyber, R.; Zagler, J.; Liu, C.; Yadav, U. N.; O'Dwyer, Z.; Hart, K.; Chapman, K.; McGrady, L.; Kohn, A.; Winterfield, N.; Williams, D.; Watson, N.; Morey, K.; Pearson, O.
Aim: Healthy Heart Actions Right Time (HHART) is a multi-phased research project that seeks to identify, implement and evaluate strategies to connect community and clinical activities to reduce the burden of heart disease for Aboriginal and Torres Strait Islander people. The aim in Phase One was to identify priority activities for two participating services. Background: The ongoing effects of colonisation drive a disproportionate burden of heart disease for Aboriginal and Torres Strait Islander people. Clinical and community groups both have established strengths in reducing the risk of heart disease, but these are not always well connected. Methods: Using a case study methodology in two locations we partnered in a 12-month co-design process to identify priority activities to connect clinical and community activities. Findings: Three priorities emerged from the Phase One co-design process: (i) community-led gardening as a strategy to promote heart health through connection and healthy lifestyles; (ii) community days to increase engagement in heart checks and strengthen community-clinic relationship; and (iii) clinic-led development of culturally relevant education resources to promote clinician confidence and community heart health knowledge.
Garavito Jimenez, D. A.; Bello Angulo, D. E.; Mejia Lemus, L. T.; Chipatecua, D.; Fula, D. D.; Perez-Rubiano, S.; Martinez, F. L.; Bohorquez Pinzon, J. C.
Background Between 2024 and 2025, Colombia universalized the Electronic Health Invoice with embedded Individual Health Services Delivery Records (RIPS), together termed FEV-RIPS, as the standard for financial and clinical data exchange. ADRES, the entity responsible for administering the resources of Colombia's General Social Security Health System, faced the challenge of processing information from multiple heterogeneous sources generated by more than 55,000 healthcare providers. Health systems in high-income countries converge clinical-financial data in consolidated platforms; Colombia started from a fragmented architecture with incompatible historical sources, no cross-database standardization, and no centralized analytical infrastructure until 2023. Objective We describe the design, technical challenges of integrating heterogeneous data, and operational performance of the analytical infrastructure built by ADRES to centralize large-scale processing of Colombian health system information, and derive transferable lessons for health system resource administrators in Latin America facing equivalent digitalization mandates. Methods This is a technical-descriptive report based on operational metrics from the ADRES Azure/Databricks environment during January-November 2025. We report indicators of data volume, processing speed, computational capacity, concurrent use by functional group, and governance structure. The architecture integrates VPN connectivity with MinSalud, automated processing of multiple formats (XML, relational tables, flat files), and a medallion data lake (Bronze/Silver/Gold). Data quality challenges include structural inconsistencies across sources, coding incompatibilities (municipalities, dates, diagnoses), format heterogeneities in unstructured data, and absent technical documentation.
Results The platform manages 21 catalogs, 1,183 tables, and over 110,645 million stored records, with cumulative production exceeding 1 trillion processed records. It executes queries on 100 billion records in ten seconds using clusters of up to 32 TB RAM and 4,096 vCPU. During September-October 2025, monthly query peaks reached 78,028 across eleven functional groups. Integration required Python/PySpark parsers for variable-depth XML, equivalence tables for incompatible municipality codes, cleaning routines for extreme dates used as nulls (1900-01-01, 9999-12-31), and transformation logic bridging classic RIPS and FEV-RIPS. The platform supported econometric analyses, judicial mandate responses, and public interactive dashboards. Conversational AI integration (Genie, Copilot) extends analytical access to users without SQL knowledge. Conclusions In one year, ADRES built an analytical infrastructure, and this report provides, to our knowledge, the first published documentation of the systemic technical challenges of integrating heterogeneous data sources in a middle-income social security health system. Centralizing health system information at national scale is technically feasible under public institutional constraints, but requires solving cross-source standardization problems the implementation literature does not document with quantitative precision. The derived lessons are transferable to health system resource administrators in Latin America facing equivalent challenges.
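The cleaning of extreme dates used as nulls can be sketched in a few lines of plain Python. The abstract mentions Python/PySpark routines but does not show them, so the function name `clean_date`, the ISO `YYYY-MM-DD` input format, and the choice to map unparseable values to `None` are assumptions for illustration rather than the ADRES implementation.

```python
from datetime import date, datetime
from typing import Optional

# Sentinel values named in the abstract as dates used in place of nulls.
SENTINEL_DATES = {date(1900, 1, 1), date(9999, 12, 31)}


def clean_date(raw: Optional[str]) -> Optional[date]:
    """Parse an ISO date string, mapping sentinels and bad input to None.

    Assumes YYYY-MM-DD input; real sources may need several formats.
    """
    if raw is None or not raw.strip():
        return None
    try:
        parsed = datetime.strptime(raw.strip(), "%Y-%m-%d").date()
    except ValueError:
        # Unparseable values are treated as missing (an assumption).
        return None
    return None if parsed in SENTINEL_DATES else parsed


cleaned = [clean_date(v) for v in ["2025-03-14", "1900-01-01", "9999-12-31", "", "n/a"]]
```

Only the first value survives as a real date here; the two sentinels, the empty string, and the malformed entry all become `None`, which keeps placeholder dates out of downstream date arithmetic.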
Kalamkarian, A.; Pilkington, R. M.; Lynch, J.; Mittinty, M. N.; Malvaso, C.; Hawkins, K.; Pharo, H.; Beck, K.; Chittleborough, C. R.
Background: Whole-population linked administrative data platforms provide an opportunity to generate evidence on early life multidimensional disadvantage to inform resourcing and service provision to families with complex needs. Methods: We used individual-level de-identified data from nine administrative data sources included in the Better Evidence Better Outcomes Linked Data (BEBOLD) platform. The population included all children born in South Australia between 2004 and 2011 (n=143,083), and their parents. We described the prevalence and distribution of multiple disadvantages affecting children from the 12 months before birth to age 5. Eleven domains of parental disadvantage were created: economic, education, access to services, mental health, substance misuse, smoking during pregnancy, domestic and family violence, health, child protection contact, justice system contact, and death. We investigated the concordance of our measure with an area-level socioeconomic measure used in government reporting. Results: One in two children (48%) were exposed to at least one disadvantage domain, and one in seven (14%) were exposed to three or more domains before age five. Economic disadvantage was most prevalent, affecting one in four (27%) children, of whom 75% were exposed to additional forms of disadvantage. Substance misuse, domestic and family violence, and justice system contact were the least likely domains to occur in isolation. Only 54.4% of children who experienced five or more disadvantage domains were classified in the area-level socioeconomic measure's 'most disadvantaged' quintile. Conclusion: Early life exposure to parental disadvantage can be highly multidimensional. Measurement across different systems is important for informing coordinated service provision for families with complex needs.
KESOZI Digital Twin; Agumba, J. O.; Namusonge, L.; Ogendo, J.; Hassan, M. A.; Pembere, A.; Takavarasha, M.
Childhood diarrheal disease remains a leading cause of morbidity and mortality among children under five years in sub-Saharan Africa, particularly in settings affected by inadequate sanitation, climate variability, malnutrition, and limited healthcare access. Conventional forecasting approaches are often constrained by sparse surveillance data, weak spatial representation, and limited incorporation of mechanistic disease dynamics. This study presents a Physics-Informed Multimodal Artificial Intelligence Digital Twin framework that integrates Physics-Informed Neural Networks, Graph Neural Networks, diffusion-reaction epidemiological modeling, multimodal fusion learning, and Digital Twin simulation to estimate and predict childhood diarrheal disease burden in Kenya, Somaliland, and Zimbabwe. Using public epidemiological, environmental, climate, sanitation, and synthetic proof-of-concept datasets, the framework modeled temporal disease dynamics, spatial transmission, pathogen-attributed burden, and outbreak trajectories while enforcing epidemiological consistency through physics-informed optimization. Results demonstrated robust forecasting performance, enhanced spatial transmission modeling, uncertainty-aware predictions, and realistic outbreak simulations across the three countries. Rotavirus, Shigella, and Cryptosporidium were identified as major contributors to modeled mortality burden, while unsafe water exposure, poor sanitation, malnutrition, and climate-sensitive transmission substantially increased disease risk. Compared with a Bayesian baseline model, the multimodal framework achieved superior nonlinear risk characterization, geospatial learning, and temporal prediction. These findings highlight the potential of scientific machine learning and digital twin systems for infectious disease surveillance, outbreak forecasting, climate-health analytics, and evidence-based public health decision-making in low-resource African settings. 
Keywords: Physics-Informed Neural Networks, Graph Neural Networks, Digital Twin, Childhood Diarrheal Disease, Epidemiology, Kenya, Somaliland, Zimbabwe, Scientific Machine Learning, Spatial Epidemiology, Multimodal Fusion
Vanbrabant, E.; Roefs, A.; Goossens, G.; Lemmens, L.; Shapovalova, Y.; Hesen, J.; Mironiuc, C.
Background: Obesity is globally recognized as a complex, multifactorial chronic disease, with biological, psychological, environmental and behavioural factors involved in both disease pathogenesis and maintenance. Although previous group-based studies demonstrated involvement of each of these factors, there is large inter-individual variability in the factors contributing to disease development as well as intervention outcomes, causing limited translatability to the individual level. This heterogeneity in treatment effectiveness might be due to differential causal and maintenance factors of obesity. To enable the transition from a one-size-fits-all approach to a more personalized approach for individuals with overweight or obesity, this study aims to investigate if and how the degree of weight loss and changes in daily life behaviour after a combined lifestyle intervention depend on individual baseline profiles comprising person characteristics, biological, psychological, environmental and behavioural factors. Methods: This study will include 600 individuals varying in BMI: 200 participants with a healthy BMI (18.5-24.9 kg/m2), 200 with overweight (BMI 25.0-29.9 kg/m2), and 200 with obesity (BMI ≥30.0 kg/m2). For all participants, a comprehensive individual baseline profile is created, including person characteristics, biological, psychological, environmental and behavioural factors. A clustering method is applied to identify clusters of participants with similar characteristics. Next, we examine if and how these clusters are linked to bodyweight indicators measured at baseline, and how they relate to daily lifestyle behaviour, as measured by ecological momentary assessment (EMA) using a smartphone app and sensor technology (3-week measurements).
Individuals with overweight or obesity will be randomized to the intensive lifestyle intervention or a lifestyle information condition, to determine if treatment response can be predicted based on cluster characteristics, how daily lifestyle behaviour changes after an intervention, and how changes in daily lifestyle behaviour relate to treatment response. Discussion: The End of Average study aims to characterize a large set of individuals varying in body weight to predict intervention effectiveness measured as changes in body weight indicators and in daily lifestyle behaviours. If reliable predictors of treatment success can be identified, these can be applied in personalized lifestyle interventions to improve lifestyle behaviour, body weight management and overall health.
Odeny, T. A.; Adhiambo, H. F.; Mangale, D.; Makanga, P. K.; Odeny, B.; Okuku, F.; Zhou, C.; Geng, E.; Carson, J.; Mudhune, V.; Bukusi, E.; Semeere, A.
Background: Kaposi sarcoma (KS) is the most common cancer among men in several Eastern African countries, yet treatment monitoring relies on imprecise, time-consuming ruler-based measurements defined by the AIDS Clinical Trials Group (ACTG). This method suffers from inter-observer variability, fails to capture lesion height or true geometric area, and performs poorly on dark skin. SkinScan3D (SS3D) is a portable, low-cost, AI-enabled 3D imaging device that provides objective measurements of KS skin lesion area, height, volume, and color. The Precision Imaging to Evaluate Kaposi Sarcoma (PRIME-KS) study evaluates whether SS3D provides more reproducible and accurate lesion measurements than the standard method, and validates its integration into routine clinical workflows in Kenya and Uganda. Methods: PRIME-KS is a multicountry prospective mixed-methods study with two clinical objectives. Objective 1 is a cross-sectional diagnostic accuracy study comparing SS3D with ruler-based measurement in 50 adults with KS (150 lesions) across sites in Kenya and Uganda. Two clinicians independently measure three lesions per participant using both methods. The primary outcomes are the concordance correlation coefficient (CCC) for inter-rater reproducibility and the coefficient of determination for accuracy. Objective 2 is a non-randomized before-and-after pilot study in 100 patients at three sites, evaluating device usability, acceptability, appropriateness, and feasibility using validated instruments, along with time-and-motion studies and activity-based micro-costing. Prior to these clinical objectives, a formative study used focus group discussions, discrete choice experiments, and human-centered design workshops to refine the SS3D device and protocols with end-user input. Discussion: PRIME-KS will provide the first rigorous evaluation of a 3D imaging device for monitoring KS treatment response in routine clinical settings.
If SS3D demonstrates superior reproducibility and clinical utility, it could reduce unnecessary chemotherapy exposure and associated toxicities by enabling earlier, more objective assessment of treatment response. Trial registration: ClinicalTrials.gov NCT06898203, registered 27 March 2025. Pan African Clinical Trials Registry PACTR202603523439856. Keywords Kaposi sarcoma, SkinScan3D, 3D imaging, treatment monitoring, diagnostic accuracy, implementation science, usability, human-centered design, Kenya, Uganda
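For readers unfamiliar with the concordance correlation coefficient named as a primary outcome, a minimal Python sketch follows. It assumes Lin's standard definition with population (1/n) moments, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²); the abstract does not say which estimator or software the study will use, and the measurements below are invented for illustration.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two paired raters.

    Uses population (1/n) variances and covariance. Raises ZeroDivisionError
    if both series are constant and have the same mean.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes a systematic offset between raters.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical lesion areas (cm^2) from two raters; rater B reads 1 unit higher.
rater_a = [1.0, 2.0, 3.0]
rater_b = [2.0, 3.0, 4.0]
ccc = concordance_correlation(rater_a, rater_b)
```

Unlike Pearson correlation, which would be 1 for these two raters, the CCC drops to 4/7 (about 0.57) because the constant offset counts as disagreement, which is why it suits inter-rater reproducibility.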